SemBioNLQA: A semantic biomedical question answering system for retrieving exact and ideal answers to natural language questions.
Internal identifier: 000645 (Main/Exploration); previous: 000644; next: 000646
Authors: Mourad Sarrouti [United States]; Said Ouatik El Alaoui [Morocco]
Source:
- Artificial intelligence in medicine [1873-2860]; 2020.
French descriptors
- KwdFr:
- Algorithmes (MeSH), Apprentissage machine (MeSH), Automatisation (MeSH), Humains (MeSH), Informatique médicale (méthodes), Mémorisation et recherche des informations (MeSH), PubMed (MeSH), Technologie biomédicale (méthodes), Traitement du langage naturel (MeSH), Unified medical language system (USA) (MeSH).
English descriptors
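- KwdEn:
- Algorithms (MeSH), Automation (MeSH), Biomedical Technology (methods), Humans (MeSH), Information Storage and Retrieval (MeSH), Machine Learning (MeSH), Medical Informatics (methods), Natural Language Processing (MeSH), PubMed (MeSH), Unified Medical Language System (MeSH).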
Abstract
BACKGROUND AND OBJECTIVE
Question answering (QA), the identification of short, accurate answers to users' questions written in natural language, is a longstanding problem that has been widely studied over the last decades in the open domain. However, it remains a real challenge in the biomedical domain, as most existing systems support a limited number of question and answer types and still require further effort to improve their precision on the supported questions. Here, we present a semantic biomedical QA system named SemBioNLQA, which can handle yes/no, factoid, list, and summary natural language questions.
METHODS
This paper describes the architecture and an evaluation of SemBioNLQA, an end-to-end biomedical QA system that consists of question classification, document retrieval, passage retrieval, and answer extraction modules. It takes natural language questions as input and outputs both short, precise answers and summaries. SemBioNLQA, which handles four types of questions, is based on (1) handcrafted lexico-syntactic patterns and a machine learning algorithm for question classification, (2) the PubMed search engine and UMLS similarity for document retrieval, (3) the BM25 model, stemmed words, and UMLS concepts for passage retrieval, and (4) the UMLS Metathesaurus, BioPortal synonyms, sentiment analysis, and a term frequency metric for answer extraction.
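The passage retrieval step above names the BM25 model. As a rough illustration only, the following Python sketch ranks candidate passages with a plain Okapi BM25 formula. It is not the SemBioNLQA implementation: the function name `bm25_scores`, the parameter values `k1=1.2` and `b=0.75`, the whitespace tokenization, and the sample passages are all assumptions made for this example, and the stemming and UMLS concept mapping mentioned in the abstract are omitted.

```python
import math
from collections import Counter


def bm25_scores(query, passages, k1=1.2, b=0.75):
    """Score each passage against the query with a basic Okapi BM25 formula.

    Hypothetical helper for illustration; k1 and b are common default values,
    not settings reported for SemBioNLQA.
    """
    # Assumption: lowercase whitespace tokenization (no stemming, no UMLS concepts).
    docs = [p.lower().split() for p in passages]
    n_docs = len(docs)
    avg_len = sum(len(d) for d in docs) / n_docs
    # Document frequency: number of passages that contain each term.
    df = Counter(term for d in docs for term in set(d))

    scores = []
    for d in docs:
        tf = Counter(d)
        score = 0.0
        for term in query.lower().split():
            if term not in tf:
                continue
            idf = math.log(1 + (n_docs - df[term] + 0.5) / (df[term] + 0.5))
            # Length normalization: longer passages are penalized via b.
            norm = tf[term] + k1 * (1 - b + b * len(d) / avg_len)
            score += idf * tf[term] * (k1 + 1) / norm
        scores.append(score)
    return scores


# Made-up passages, used only to exercise the function.
passages = [
    "aspirin inhibits platelet aggregation",
    "the weather is sunny today",
    "aspirin reduces fever and pain",
]
scores = bm25_scores("aspirin platelet", passages)
best = max(range(len(passages)), key=scores.__getitem__)
print(passages[best])
```

In this toy run the first passage ranks highest because it matches both query terms, while the passage with no matching term scores zero. A closer approximation of the described system would replace the tokenizer with stemmed words and concept identifiers.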
RESULTS AND CONCLUSION
Compared with current state-of-the-art biomedical QA systems, SemBioNLQA, a fully automated system, has the potential to handle a large number of question and answer types. SemBioNLQA quickly addresses users' information needs by returning exact answers (e.g., "yes", "no", a biomedical entity name) and ideal answers (i.e., paragraph-sized summaries of relevant information) for yes/no, factoid, and list questions, whereas it provides only ideal answers for summary questions. Moreover, experimental evaluations performed on biomedical questions and answers provided by the BioASQ challenge, in particular in 2015, 2016, and 2017 (as part of our participation), show that SemBioNLQA performs well compared with the most recent state-of-the-art systems and offers a practical and competitive alternative to help information seekers find exact and ideal answers to their biomedical questions. The SemBioNLQA source code is publicly available at https://github.com/sarrouti/sembionlqa.
DOI: 10.1016/j.artmed.2019.101767
PubMed: 31980104
Links toward previous steps (curation, corpus...)
- to stream PubMed, to step Corpus: 000174
- to stream PubMed, to step Curation: 000173
- to stream PubMed, to step Checkpoint: 000093
- to stream Main, to step Merge: 000645
- to stream Main, to step Curation: 000645
The document in XML format
<record><TEI><teiHeader><fileDesc><titleStmt><title xml:lang="en">SemBioNLQA: A semantic biomedical question answering system for retrieving exact and ideal answers to natural language questions.</title>
<author><name sortKey="Sarrouti, Mourad" sort="Sarrouti, Mourad" uniqKey="Sarrouti M" first="Mourad" last="Sarrouti">Mourad Sarrouti</name>
<affiliation wicri:level="2"><nlm:affiliation>Lister Hill National Center for Biomedical Communications, U.S. National Library of Medicine, U.S. National Institutes of Health, Bethesda, MD. Electronic address: sarrouti.mourad@gmail.com.</nlm:affiliation>
<country xml:lang="fr">États-Unis</country>
<placeName><region type="state">Maryland</region>
</placeName>
<wicri:cityArea>Lister Hill National Center for Biomedical Communications, U.S. National Library of Medicine, U.S. National Institutes of Health, Bethesda</wicri:cityArea>
</affiliation>
</author>
<author><name sortKey="Ouatik El Alaoui, Said" sort="Ouatik El Alaoui, Said" uniqKey="Ouatik El Alaoui S" first="Said" last="Ouatik El Alaoui">Said Ouatik El Alaoui</name>
<affiliation wicri:level="1"><nlm:affiliation>National School of Applied Sciences, Ibn Tofail University, Kenitra, Morocco; Laboratory of Informatics and Modeling, FSDM, Sidi Mohammed Ben Abdellah University, Fez, Morocco.</nlm:affiliation>
<country xml:lang="fr">Maroc</country>
<wicri:regionArea>National School of Applied Sciences, Ibn Tofail University, Kenitra, Morocco; Laboratory of Informatics and Modeling, FSDM, Sidi Mohammed Ben Abdellah University, Fez</wicri:regionArea>
<wicri:noRegion>Fez</wicri:noRegion>
</affiliation>
</author>
</titleStmt>
<publicationStmt><idno type="wicri:source">PubMed</idno>
<date when="2020">2020</date>
<idno type="RBID">pubmed:31980104</idno>
<idno type="pmid">31980104</idno>
<idno type="doi">10.1016/j.artmed.2019.101767</idno>
<idno type="wicri:Area/PubMed/Corpus">000174</idno>
<idno type="wicri:explorRef" wicri:stream="PubMed" wicri:step="Corpus" wicri:corpus="PubMed">000174</idno>
<idno type="wicri:Area/PubMed/Curation">000173</idno>
<idno type="wicri:explorRef" wicri:stream="PubMed" wicri:step="Curation">000173</idno>
<idno type="wicri:Area/PubMed/Checkpoint">000093</idno>
<idno type="wicri:explorRef" wicri:stream="PubMed" wicri:step="Checkpoint">000093</idno>
<idno type="wicri:Area/Main/Merge">000645</idno>
<idno type="wicri:Area/Main/Curation">000645</idno>
<idno type="wicri:Area/Main/Exploration">000645</idno>
</publicationStmt>
<sourceDesc><biblStruct><analytic><title xml:lang="en">SemBioNLQA: A semantic biomedical question answering system for retrieving exact and ideal answers to natural language questions.</title>
<author><name sortKey="Sarrouti, Mourad" sort="Sarrouti, Mourad" uniqKey="Sarrouti M" first="Mourad" last="Sarrouti">Mourad Sarrouti</name>
<affiliation wicri:level="2"><nlm:affiliation>Lister Hill National Center for Biomedical Communications, U.S. National Library of Medicine, U.S. National Institutes of Health, Bethesda, MD. Electronic address: sarrouti.mourad@gmail.com.</nlm:affiliation>
<country xml:lang="fr">États-Unis</country>
<placeName><region type="state">Maryland</region>
</placeName>
<wicri:cityArea>Lister Hill National Center for Biomedical Communications, U.S. National Library of Medicine, U.S. National Institutes of Health, Bethesda</wicri:cityArea>
</affiliation>
</author>
<author><name sortKey="Ouatik El Alaoui, Said" sort="Ouatik El Alaoui, Said" uniqKey="Ouatik El Alaoui S" first="Said" last="Ouatik El Alaoui">Said Ouatik El Alaoui</name>
<affiliation wicri:level="1"><nlm:affiliation>National School of Applied Sciences, Ibn Tofail University, Kenitra, Morocco; Laboratory of Informatics and Modeling, FSDM, Sidi Mohammed Ben Abdellah University, Fez, Morocco.</nlm:affiliation>
<country xml:lang="fr">Maroc</country>
<wicri:regionArea>National School of Applied Sciences, Ibn Tofail University, Kenitra, Morocco; Laboratory of Informatics and Modeling, FSDM, Sidi Mohammed Ben Abdellah University, Fez</wicri:regionArea>
<wicri:noRegion>Fez</wicri:noRegion>
</affiliation>
</author>
</analytic>
<series><title level="j">Artificial intelligence in medicine</title>
<idno type="eISSN">1873-2860</idno>
<imprint><date when="2020" type="published">2020</date>
</imprint>
</series>
</biblStruct>
</sourceDesc>
</fileDesc>
<profileDesc><textClass><keywords scheme="KwdEn" xml:lang="en"><term>Algorithms (MeSH)</term>
<term>Automation (MeSH)</term>
<term>Biomedical Technology (methods)</term>
<term>Humans (MeSH)</term>
<term>Information Storage and Retrieval (MeSH)</term>
<term>Machine Learning (MeSH)</term>
<term>Medical Informatics (methods)</term>
<term>Natural Language Processing (MeSH)</term>
<term>PubMed (MeSH)</term>
<term>Unified Medical Language System (MeSH)</term>
</keywords>
<keywords scheme="KwdFr" xml:lang="fr"><term>Algorithmes (MeSH)</term>
<term>Apprentissage machine (MeSH)</term>
<term>Automatisation (MeSH)</term>
<term>Humains (MeSH)</term>
<term>Informatique médicale (méthodes)</term>
<term>Mémorisation et recherche des informations (MeSH)</term>
<term>PubMed (MeSH)</term>
<term>Technologie biomédicale (méthodes)</term>
<term>Traitement du langage naturel (MeSH)</term>
<term>Unified medical language system (USA) (MeSH)</term>
</keywords>
<keywords scheme="MESH" qualifier="methods" xml:lang="en"><term>Biomedical Technology</term>
<term>Medical Informatics</term>
</keywords>
<keywords scheme="MESH" qualifier="méthodes" xml:lang="fr"><term>Informatique médicale</term>
<term>Technologie biomédicale</term>
</keywords>
<keywords scheme="MESH" xml:lang="en"><term>Algorithms</term>
<term>Automation</term>
<term>Humans</term>
<term>Information Storage and Retrieval</term>
<term>Machine Learning</term>
<term>Natural Language Processing</term>
<term>PubMed</term>
<term>Unified Medical Language System</term>
</keywords>
<keywords scheme="MESH" xml:lang="fr"><term>Algorithmes</term>
<term>Apprentissage machine</term>
<term>Automatisation</term>
<term>Humains</term>
<term>Mémorisation et recherche des informations</term>
<term>PubMed</term>
<term>Traitement du langage naturel</term>
<term>Unified medical language system (USA)</term>
</keywords>
</textClass>
</profileDesc>
</teiHeader>
<front><div type="abstract" xml:lang="en"><p><b>BACKGROUND AND OBJECTIVE</b>
</p>
<p>Question answering (QA), the identification of short accurate answers to users questions written in natural language expressions, is a longstanding issue widely studied over the last decades in the open-domain. However, it still remains a real challenge in the biomedical domain as the most of the existing systems support a limited amount of question and answer types as well as still require further efforts in order to improve their performance in terms of precision for the supported questions. Here, we present a semantic biomedical QA system named SemBioNLQA which has the ability to handle the kinds of yes/no, factoid, list, and summary natural language questions.</p>
</div>
<div type="abstract" xml:lang="en"><p><b>METHODS</b>
</p>
<p>This paper describes the system architecture and an evaluation of the developed end-to-end biomedical QA system named SemBioNLQA, which consists of question classification, document retrieval, passage retrieval and answer extraction modules. It takes natural language questions as input, and outputs both short precise answers and summaries as results. The SemBioNLQA system, dealing with four types of questions, is based on (1) handcrafted lexico-syntactic patterns and a machine learning algorithm for question classification, (2) PubMed search engine and UMLS similarity for document retrieval, (3) the BM25 model, stemmed words and UMLS concepts for passage retrieval, and (4) UMLS metathesaurus, BioPortal synonyms, sentiment analysis and term frequency metric for answer extraction.</p>
</div>
<div type="abstract" xml:lang="en"><p><b>RESULTS AND CONCLUSION</b>
</p>
<p>Compared with the current state-of-the-art biomedical QA systems, SemBioNLQA, a fully automated system, has the potential to deal with a large amount of question and answer types. SemBioNLQA retrieves quickly users' information needs by returning exact answers (e.g., "yes", "no", a biomedical entity name, etc.) and ideal answers (i.e., paragraph-sized summaries of relevant information) for yes/no, factoid and list questions, whereas it provides only the ideal answers for summary questions. Moreover, experimental evaluations performed on biomedical questions and answers provided by the BioASQ challenge especially in 2015, 2016 and 2017 (as part of our participation), show that SemBioNLQA achieves good performances compared with the most current state-of-the-art systems and allows a practical and competitive alternative to help information seekers find exact and ideal answers to their biomedical questions. The SemBioNLQA source code is publicly available at https://github.com/sarrouti/sembionlqa.</p>
</div>
</front>
</TEI>
<affiliations><list><country><li>Maroc</li>
<li>États-Unis</li>
</country>
<region><li>Maryland</li>
</region>
</list>
<tree><country name="États-Unis"><region name="Maryland"><name sortKey="Sarrouti, Mourad" sort="Sarrouti, Mourad" uniqKey="Sarrouti M" first="Mourad" last="Sarrouti">Mourad Sarrouti</name>
</region>
</country>
<country name="Maroc"><noRegion><name sortKey="Ouatik El Alaoui, Said" sort="Ouatik El Alaoui, Said" uniqKey="Ouatik El Alaoui S" first="Said" last="Ouatik El Alaoui">Said Ouatik El Alaoui</name>
</noRegion>
</country>
</tree>
</affiliations>
</record>
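As a small how-to, the identifiers in the record above can be pulled out with a few lines of Python. This is only a sketch under stated assumptions: the record as displayed here declares no namespaces for the `wicri:` and `nlm:` prefixes, so a namespace-aware XML parser may reject it, and the example therefore falls back on a regular expression. The function name `extract_ids` and the pattern `IDNO` are hypothetical helpers, not part of Dilib.

```python
import re

# A fragment of the record shown above, copied verbatim.
record = '''<publicationStmt><idno type="wicri:source">PubMed</idno>
<date when="2020">2020</date>
<idno type="RBID">pubmed:31980104</idno>
<idno type="pmid">31980104</idno>
<idno type="doi">10.1016/j.artmed.2019.101767</idno>
</publicationStmt>'''

# Matches <idno type="..."[other attributes]>value</idno> and captures type and value.
IDNO = re.compile(r'<idno type="([^"]+)"[^>]*>([^<]*)</idno>')


def extract_ids(xml_text):
    """Return a dict mapping each idno type to its first value in the text."""
    ids = {}
    for kind, value in IDNO.findall(xml_text):
        # Keep the first occurrence when a type appears more than once.
        ids.setdefault(kind, value)
    return ids


ids = extract_ids(record)
print(ids["pmid"], ids["doi"])
```

A regular expression is adequate here because the `idno` elements are flat and contain no nested markup; for anything more structural, declaring the missing namespaces and using a real XML parser would be the safer route.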
To manipulate this document under Unix (Dilib)
EXPLOR_STEP=$WICRI_ROOT/Wicri/Sante/explor/MaghrebDataLibMedV2/Data/Main/Exploration
HfdSelect -h $EXPLOR_STEP/biblio.hfd -nk 000645 | SxmlIndent | more
Or
HfdSelect -h $EXPLOR_AREA/Data/Main/Exploration/biblio.hfd -nk 000645 | SxmlIndent | more
To add a link to this page within the Wicri network
{{Explor lien |wiki= Wicri/Sante |area= MaghrebDataLibMedV2 |flux= Main |étape= Exploration |type= RBID |clé= pubmed:31980104 |texte= SemBioNLQA: A semantic biomedical question answering system for retrieving exact and ideal answers to natural language questions. }}
To generate wiki pages
HfdIndexSelect -h $EXPLOR_AREA/Data/Main/Exploration/RBID.i -Sk "pubmed:31980104" \
  | HfdSelect -Kh $EXPLOR_AREA/Data/Main/Exploration/biblio.hfd \
  | NlmPubMed2Wicri -a MaghrebDataLibMedV2
This area was generated with Dilib version V0.6.38.